Eigen-MLLR environment/speaker compensation for robust speech recognition

Authors

  • Yuan-Fu Liao
  • Hung-Hsiang Fang
  • Chi-Hui Hsu
Abstract

It is generally difficult, if not impossible, to prepare a complete set of a priori knowledge about noisy environments. In particular, noisy environments that are not seen in the training phase (unseen noisy environments) can seriously degrade the performance of non-blind methods. How the a priori noisy environment knowledge is organized and used is therefore crucial. To use this knowledge efficiently and, at the same time, alleviate the problem of unseen noisy environments and speakers, this paper adopts an eigen-maximum likelihood linear regression (Eigen-MLLR) method that was originally proposed only for speaker adaptation [7-8]. The method uses a set of a priori noisy environment/speaker knowledge to compensate online for the characteristics of an unknown test environment/speaker. The proposed Eigen-MLLR method was evaluated on the Aurora 2 multi-condition training task and achieved an average word error rate (WER) of 6.14%. It outperformed not only the multi-condition training baseline (Multi-Con., 13.72%) but also the blind ETSI advanced DSR front-end (ETSI-Adv., 8.65%), histogram equalization (HEQ, 8.66%), and the non-blind reference model weighting (RMW, 7.29%) approaches.
The approach is straightforward from the viewpoint of speaker adaptation, but it is motivated by our recent finding [5] that the characteristics of different kinds of noisy environments and speakers (represented by a set of MLLR super-matrices) can be organized simultaneously in a PCA-constructed Eigen-MLLR subspace. In particular, the first three dimensions of the constructed Eigen-MLLR subspace are highly related to the SNR value, gender, and type of noise. It is therefore possible to (1) analyze a set of environment/speaker characteristics collected from all seen noisy environments/speakers in the training phase to construct an Eigen-MLLR environment/speaker subspace, and then (2) estimate, optimally in the maximum-likelihood sense, the characteristics of the unknown test noisy environment/speaker in the test phase to compensate the HMMs of the ASR engine.
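The two steps described above (PCA over training-time MLLR transforms, then maximum-likelihood estimation of the subspace coordinates for a test condition) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a single global regression class, mean-only MLLR of the form W [1, mu], diagonal covariances, and Gaussian occupancies that are already available from a first recognition pass. The function names `build_eigen_mllr` and `estimate_weights` and all array layouts are hypothetical.

```python
import numpy as np


def build_eigen_mllr(transforms, n_components):
    """Step 1: PCA over vectorised MLLR transforms.

    transforms: array (N, d, d + 1), one mean transform W = [b A] per seen
    environment/speaker (a single regression class is assumed here).
    Returns the mean transform (d, d + 1) and the basis (K, d, d + 1).
    """
    n, d, d1 = transforms.shape
    flat = transforms.reshape(n, -1)
    mean = flat.mean(axis=0)
    # Rows of vt are the principal directions of the centred data.
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    basis = vt[:n_components]
    return mean.reshape(d, d1), basis.reshape(n_components, d, d1)


def estimate_weights(mean_w, basis, means, inv_vars, occ, obs_sum):
    """Step 2: ML estimate of the subspace coordinates for fixed occupancies.

    means:    (M, d) Gaussian means of the HMM set
    inv_vars: (M, d) inverse diagonal variances
    occ:      (M,)   occupancy of each Gaussian, sum_t gamma_m(t)
    obs_sum:  (M, d) occupancy-weighted observation sums, sum_t gamma_m(t) o_t
    Returns (alpha, W) with W = mean_w + sum_k alpha[k] * basis[k].
    """
    m = means.shape[0]
    xi = np.hstack([np.ones((m, 1)), means])        # extended means [1, mu]
    base = xi @ mean_w.T                            # mean transform applied
    proj = np.einsum("kij,mj->kmi", basis, xi)      # each basis transform applied
    resid = obs_sum - occ[:, None] * base
    # With W linear in alpha, the auxiliary function is quadratic in alpha,
    # so its maximiser solves the normal equations A alpha = b.
    a_mat = np.einsum("jmi,mi,kmi->jk", proj, inv_vars * occ[:, None], proj)
    b_vec = np.einsum("kmi,mi->k", proj, inv_vars * resid)
    alpha = np.linalg.lstsq(a_mat, b_vec, rcond=None)[0]
    return alpha, mean_w + np.tensordot(alpha, basis, axes=1)
```

Because only K coordinates are estimated instead of a full d x (d + 1) matrix, a sketch like this needs far less adaptation data than unconstrained MLLR, which is what makes online compensation plausible. In a full system the occupancies would be recomputed with the compensated models and the estimate iterated.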


Similar resources

Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...

Variance compensation within the MLLR framework for robust speech recognition and speaker adaptation

This paper investigates the use of maximum likelihood linear regression (MLLR) for both speaker and environment adaptation. MLLR transforms the mean and variance parameters of a set of HMMs. In this paper a number of different types of linear transformations of the variances are examined including full, block diagonal, and diagonal transformation matrices. Experiments on large vocabulary speake...

Mean and variance adaptation within the MLLR framework

One of the key issues for adaptation algorithms is to modify a large number of parameters with only a small amount of adaptation data. Speaker adaptation techniques try to obtain near speaker dependent (SD) performance with only small amounts of speaker specific data, and are often based on initial speaker independent (SI) recognition systems. Some of these speaker adaptation techniques may also...

Publication date: 2008